Clustering huge protein sequence sets in linear time

Authors
Abstract

Similar resources

Projected Clustering for Huge Data Sets in MapReduce

Fast-growing data sets with a very high number of attributes are becoming common in social, industrial, and scientific domains. Meaningful analysis of such data sets requires sophisticated data mining techniques, such as projected clustering, that can handle this kind of complex data. In this work, we investigate solutions for extending the state-of-the-art projected clustering algorithm P3C for...

Clustering huge data sets for parametric PET imaging.

A new preprocessing clustering technique for quantification of kinetic PET data is presented. A two-stage clustering process, which combines a precluster and a classic hierarchical cluster analysis, provides data which are clustered according to a distance measure between time activity curves (TACs). The resulting clustered mean TACs can be used directly for estimation of kinetic parameters at ...
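
The two-stage idea described above (a cheap precluster followed by a classic hierarchical cluster analysis on the precluster means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it assumes Euclidean distance between TACs (the excerpt does not name the distance measure), plain k-means as the preclustering step, and unweighted average linkage for the hierarchical step. The function names `precluster`, `hierarchical`, and `two_stage_cluster` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Curve = List[float]  # one time activity curve (TAC): one value per time frame


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two TACs (assumed metric, not taken from the paper)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_curve(curves: Sequence[Sequence[float]]) -> Curve:
    """Frame-by-frame mean of a non-empty group of TACs."""
    return [sum(frame) / len(curves) for frame in zip(*curves)]


def precluster(tacs: Sequence[Curve], k: int, iters: int = 20) -> Tuple[List[int], List[Curve]]:
    """Stage 1 (assumed): plain k-means with deterministic, evenly spaced seeds."""
    if not 1 <= k <= len(tacs):
        raise ValueError("need 1 <= k <= number of curves")
    step = len(tacs) // k
    centers = [list(tacs[i * step]) for i in range(k)]
    labels = [0] * len(tacs)
    for _ in range(iters):
        # Assign every curve to its nearest precluster centre.
        labels = [min(range(k), key=lambda c: distance(t, centers[c])) for t in tacs]
        # Recompute centres; a precluster that loses all members keeps its old centre.
        for c in range(k):
            members = [t for t, lab in zip(tacs, labels) if lab == c]
            if members:
                centers[c] = mean_curve(members)
    return labels, centers


def hierarchical(centers: Sequence[Curve], n_clusters: int) -> List[List[int]]:
    """Stage 2: average-linkage agglomeration of precluster centres (not size-weighted)."""
    groups = [[i] for i in range(len(centers))]

    def linkage(g1: List[int], g2: List[int]) -> float:
        total = sum(distance(centers[i], centers[j]) for i in g1 for j in g2)
        return total / (len(g1) * len(g2))

    while len(groups) > max(n_clusters, 1):
        pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
        a, b = min(pairs, key=lambda p: linkage(groups[p[0]], groups[p[1]]))
        groups[a].extend(groups[b])  # a < b, so deleting index b leaves index a valid
        del groups[b]
    return groups


def two_stage_cluster(tacs: Sequence[Curve], k_pre: int, n_final: int) -> Tuple[List[int], List[Curve]]:
    """Precluster the TACs, merge preclusters hierarchically, return labels and mean TACs."""
    pre_labels, centers = precluster(tacs, k_pre)
    groups = hierarchical(centers, n_final)
    pre_to_final: Dict[int, int] = {p: g for g, members in enumerate(groups) for p in members}
    labels = [pre_to_final[lab] for lab in pre_labels]
    means: List[Curve] = []
    for g in range(len(groups)):
        members = [t for t, lab in zip(tacs, labels) if lab == g]
        means.append(mean_curve(members) if members else [])
    return labels, means


if __name__ == "__main__":
    fast = [[1.0, 5.0, 2.0], [1.1, 5.2, 2.1], [0.9, 4.8, 1.9]]  # toy "early peak" curves
    slow = [[0.2, 1.0, 3.0], [0.3, 1.1, 3.2], [0.1, 0.9, 2.8]]  # toy "late rise" curves
    labels, means = two_stage_cluster(fast + slow, k_pre=3, n_final=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

One plausible reason for this design is cost: the hierarchical step, which compares every pair of groups, runs only on the `k_pre` precluster centres rather than on every individual curve, and the resulting cluster mean TACs are what would be passed on to kinetic parameter estimation.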

Clustering sequence sets for motif discovery

Most existing methods for DNA motif discovery consider only a single set of sequences to find an over-represented motif. In contrast, we consider multiple sets of sequences and group sets associated with the same motif into a cluster, assuming that each set involves a single motif. Clustering sets of sequences yields clusters of coherent motifs, improving signal-to-noise ratio or enabli...

Exact clustering in linear time

The time complexity of data clustering has been viewed as fundamentally quadratic, slowing with the number of data items, as each item is compared for similarity to preceding items. Clustering of large data sets has been infeasible without resorting to probabilistic methods or to capping the number of clusters. Here we introduce MIMOSA, a novel class of algorithms which achieve linear-time comp...
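
Under the naive scheme this excerpt describes, in which item $i$ is compared for similarity with each of the $i-1$ items before it, the total number of comparisons for $n$ items is

$$\sum_{i=1}^{n} (i-1) = \frac{n(n-1)}{2} = O(n^2).$$

For example, $n = 10^6$ items already require roughly $5 \times 10^{11}$ pairwise comparisons, which is why linear-time approaches matter for large data sets.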

Exact Subspace Clustering in Linear Time

Subspace clustering is an important unsupervised learning problem with wide applications in computer vision and data analysis. However, state-of-the-art methods for this problem suffer from high time complexity: quadratic or cubic in n (the number of data instances). In this paper we exploit a data selection algorithm to speed up computation and the robust principal component analysis to stre...

Journal

Journal title: Nature Communications

Year: 2018

ISSN: 2041-1723

DOI: 10.1038/s41467-018-04964-5